Optimistic Reinforcement Learning-Based Skill Insertions for Task and Motion Planning
Liu, Gaoyuan, de Winter, Joris, Durodie, Yuri, Steckelmacher, Denis, Nowe, Ann, Vanderborght, Bram
Abstract--Task and motion planning (TAMP) for robotic manipulation necessitates long-horizon reasoning involving versatile actions and skills. While deterministic actions can be crafted by sampling or optimizing under certain constraints, planning actions with uncertainty, i.e., probabilistic actions, remains a challenge for TAMP. In contrast, Reinforcement Learning (RL) excels at acquiring versatile, yet short-horizon, manipulation skills that are robust to uncertainty. We therefore insert RL skills into TAMP: besides the policy, an RL skill is defined with data-driven logical components that enable it to be deployed by symbolic planning. A plan refinement sub-routine is designed to further tackle the inevitable effect uncertainties. In the experiments, we compare our method with baseline hierarchical planning approaches from both the TAMP and RL fields and illustrate the strengths of the method. The results show that by embedding RL skills, we extend the capability of TAMP to domains with probabilistic skills and improve planning efficiency compared to previous methods.

Reinforcement Learning (RL) empowers robots to acquire manipulation skills without human programming. However, prior works mostly tackle single-skill or short-horizon manipulation tasks, such as grasping [1], peg insertion [2], or synergies between two actions [3]. Long-horizon manipulation planning remains a challenge in the RL field because of expanding state/action spaces, sparse rewards, etc. [4].
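The skill-insertion idea described in the abstract can be illustrated with a minimal toy sketch. All names here (`Skill`, `plan`, `execute_with_refinement`) are illustrative assumptions, not the paper's implementation: an RL skill is wrapped with symbolic preconditions and effects plus a learned success probability, and a refinement loop replans whenever an executed skill's expected effect fails to hold.

```python
import random

class Skill:
    """An RL skill wrapped with data-driven logical components (illustrative)."""
    def __init__(self, name, preconds, effects, success_prob):
        self.name = name
        self.preconds = preconds          # symbols required before execution
        self.effects = effects            # symbols expected after execution
        self.success_prob = success_prob  # learned effect reliability

    def applicable(self, state):
        return self.preconds <= state

    def execute(self, state, rng):
        # Effect uncertainty: the symbolic effect holds only with some probability.
        if rng.random() < self.success_prob:
            return state | self.effects
        return state

def plan(state, goal, skills):
    """Greedy forward search over symbolic states (a stand-in for a real planner)."""
    sequence, current = [], set(state)
    while not goal <= current:
        step = next((s for s in skills
                     if s.applicable(current) and s.effects - current), None)
        if step is None:
            return None  # no applicable skill makes progress
        sequence.append(step)
        current |= step.effects
    return sequence

def execute_with_refinement(state, goal, skills, rng, max_replans=10):
    """Replan whenever an executed skill's expected effect fails to appear."""
    current = set(state)
    for _ in range(max_replans):
        seq = plan(current, goal, skills)
        if seq is None:
            return None
        for skill in seq:
            current = skill.execute(current, rng)
            if not skill.effects <= current:
                break  # effect failed: refine by replanning from the new state
        if goal <= current:
            return current
    return None
```

With reliable skills the first plan succeeds; with `success_prob < 1` the outer loop absorbs effect failures by replanning, which mirrors the role of the plan refinement sub-routine.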
- Europe > Belgium > Brussels-Capital Region > Brussels (0.04)
- North America > United States (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > China (0.04)
Sub-goal Distillation: A Method to Improve Small Language Agents
Hashemzadeh, Maryam, Stengel-Eskin, Elias, Chandar, Sarath, Cote, Marc-Alexandre
While Large Language Models (LLMs) have demonstrated significant promise as agents in interactive tasks, their substantial computational requirements and restricted number of calls constrain their practical utility, especially in long-horizon interactive tasks such as decision-making or in scenarios involving continuous ongoing tasks. To address these constraints, we propose a method for transferring the performance of an LLM with billions of parameters to a much smaller language model (770M parameters). Our approach involves constructing a hierarchical agent comprising a planning module, which learns through Knowledge Distillation from an LLM to generate sub-goals, and an execution module, which learns to accomplish these sub-goals using elementary actions. Subsequently, we utilize this annotated data to fine-tune both the planning and execution modules. Importantly, neither module relies on real-time access to an LLM during inference, significantly reducing the overall cost associated with LLM interactions to a fixed cost. In ScienceWorld, a challenging and multi-task interactive text environment, our method surpasses standard imitation learning based solely on elementary actions by 16.7% (absolute). Our analysis highlights the efficiency of our approach compared to other LLM-based methods.

Recently, Large Language Models (LLMs) have found applications in various fields, including multi-task learning, decision-making, answering questions, summarizing documents, translating languages, completing sentences, and serving as search assistants. The promising advantage of LLMs is attributed to their training on extensive text datasets, resulting in impressive capabilities. This prior knowledge can be leveraged for action planning to solve tasks in robotics and reinforcement learning (Huang et al., 2022b; Brohan et al., 2023; Liang et al., 2023). However, the extreme size of LLMs makes them computationally unaffordable for many applications.
Consequently, there is an increasing demand to find approaches that are less computationally intensive while still capitalizing on the knowledge embedded in LLMs. One prevalent technique is the use of Knowledge Distillation (KD) (Buciluǎ et al., 2006; Hinton et al., 2015), wherein a smaller model is trained with guidance from a larger model. Through this approach, we can leverage the knowledge in an LLM to train a more compact model with a reduced number of parameters. We employ Knowledge Distillation from an LLM to train the planning module.

Figure 1: Example of annotating an expert trajectory with sub-goals for a particular variation of task 1-4 (change-the-state-of-matter-of).
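The distillation objective mentioned above can be sketched in its classic form (Hinton et al., 2015): the student is trained to match the teacher's temperature-softened output distribution via KL divergence. This is a generic, minimal sketch, not the paper's exact sequence-level setup, and the logits below are hypothetical.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p = softmax(teacher_logits, temperature)  # teacher (large LLM) targets
    q = softmax(student_logits, temperature)  # student (small model) predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl

teacher = [3.0, 1.0, 0.2]  # hypothetical teacher logits for one token
student = [2.5, 1.2, 0.4]  # hypothetical student logits for the same token
loss = distillation_loss(student, teacher)
```

A higher temperature spreads probability mass over more tokens, exposing the teacher's relative preferences among wrong answers, which is where much of the transferred knowledge lies; the `T^2` factor keeps gradient magnitudes comparable across temperatures.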
- North America > Canada > Quebec > Montreal (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- Asia > Middle East > Jordan (0.04)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)